AITopics | conjugate gradient


conjugate gradient

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

Regularized Nonlinear Acceleration

Damien Scieur, Alexandre d'Aspremont, Francis Bach

Neural Information Processing Systems, Apr-30-2026, 22:39:10 GMT

We describe a convergence acceleration technique for generic optimization problems. Our scheme computes estimates of the optimum from a nonlinear average of the iterates produced by any optimization method. The weights in this average are computed via a simple and small linear system, whose solution can be updated online. This acceleration scheme runs in parallel to the base algorithm, providing improved estimates of the solution on the fly, while the original optimization method is running. Numerical experiments are detailed on classical classification problems.
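The weighting step can be sketched in a few lines of NumPy. This is a minimal, non-authoritative sketch that assumes the iterates are stored as rows of an array, that the residuals are the successive differences x_{i+1} - x_i, and that the regularization is scaled by the spectral norm of the small residual Gram matrix. The names `rna_extrapolate` and `reg` are our own, and the online update mentioned in the abstract is not shown.

```python
import numpy as np


def rna_extrapolate(iterates, reg=1e-8):
    """Nonlinear average of iterates x_0, ..., x_{k+1} (rows of `iterates`).

    Weights c solve the small regularized system (R R^T + lam I) z = 1 and are
    normalized to sum to one; lam = reg * ||R R^T||_2 is an assumed scaling.
    """
    X = np.asarray(iterates, dtype=float)
    R = np.diff(X, axis=0)                  # rows r_i = x_{i+1} - x_i, i = 0..k
    gram = R @ R.T                          # (k+1) x (k+1), independent of the problem dimension
    lam = reg * np.linalg.norm(gram, 2)
    z = np.linalg.solve(gram + lam * np.eye(len(gram)), np.ones(len(gram)))
    c = z / z.sum()                         # weights depend on the iterates, hence "nonlinear"
    return c @ X[:-1]                       # extrapolated estimate sum_i c_i x_i


# Usage: accelerate plain gradient descent on f(x) = 0.5 x^T Q x - b^T x.
n = 5
Q = np.diag(np.linspace(1.0, 10.0, n))
b = np.ones(n)
x_star = np.linalg.solve(Q, b)

x = np.zeros(n)
iterates = [x]
for _ in range(10):
    x = x - (Q @ x - b) / 10.0              # gradient step with step size 1/L, L = 10
    iterates.append(x)

x_acc = rna_extrapolate(iterates)
err_last = np.linalg.norm(iterates[-1] - x_star)
err_rna = np.linalg.norm(x_acc - x_star)
```

The base method is left untouched: the extrapolation only reads its iterates, which is what allows it to run alongside the optimizer.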

algorithm, artificial intelligence, optimization problem


Country: Europe > France (0.15)

Genre: Research Report (0.47)

Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (1.00)


Probabilistic Linear Solvers

Neural Information Processing Systems, Feb-8-2026, 08:25:19 GMT

Machine learning poses specific challenges for the solution of linear systems due to their scale, characteristic structure, stochasticity, and the central role of uncertainty in the field.

artificial intelligence, calibration, machine learning


Country:

Europe > Germany > Baden-Württemberg > Tübingen Region > Tübingen (0.14)
North America > Panama (0.04)
North America > Canada > British Columbia > Metro Vancouver Regional District > Vancouver (0.04)
Europe > Spain > Andalusia > Cádiz Province > Cadiz (0.04)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.47)
Information Technology > Artificial Intelligence > Machine Learning (0.36)


Scalable Principal-Agent Contract Design via Gradient-Based Optimization

Tomer Galanti, Aarya Bookseller, Korok Ray

arXiv.org Artificial Intelligence, Oct-27-2025

We study a bilevel max-max optimization framework for principal-agent contract design, in which a principal chooses incentives to maximize utility while anticipating the agent's best response. This problem, central to moral hazard and contract theory, underlies applications ranging from market design to delegated portfolio management, hedge fund fee structures, and executive compensation. While linear-quadratic models such as Holmström-Milgrom admit closed-form solutions, realistic environments with nonlinear utilities, stochastic dynamics, or high-dimensional actions generally do not. We introduce a generic algorithmic framework that removes this reliance on closed forms. Our method adapts modern machine learning techniques for bilevel optimization (implicit differentiation with conjugate gradients, CG) to compute hypergradients efficiently through Hessian-vector products, without ever forming or inverting Hessians. In benchmark CARA-Normal (Constant Absolute Risk Aversion with Gaussian distribution of uncertainty) environments, the approach recovers known analytical optima and converges reliably from random initialization. More broadly, because it is matrix-free, variance-reduced, and problem-agnostic, the framework extends naturally to complex nonlinear contracts where closed-form solutions are unavailable, such as sigmoidal wage schedules (logistic pay), relative-performance/tournament compensation with common shocks, multi-task contracts with vector actions and heterogeneous noise, and CARA-Poisson count models with $\mathbb{E}[X\mid a]=e^{a}$. This provides a new computational tool for contract design, enabling systematic study of models that have remained analytically intractable.
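The implicit-differentiation step can be illustrated on a deliberately tiny model. If the agent's best response a*(θ) satisfies ∇_a U_A(a*, θ) = 0, the implicit function theorem gives the hypergradient ∇F(θ) = ∂_θ U_P + Bᵀ v, where (-H_aa) v = ∇_a U_P, H_aa is the agent's Hessian in a, and B = ∂(∇_a U_A)/∂θ. The linear system is solved by CG, which needs only products with -H_aa. The sketch below assumes a hypothetical toy (linear wage θ, quadratic effort cost K, principal utility wᵀa - θᵀa), not the paper's CARA-Normal benchmark, and `hypergradient` and `neg_hess_vp` are our own names.

```python
import numpy as np


def conjugate_gradient(matvec, b, tol=1e-10, max_iter=None):
    """Solve M x = b for symmetric positive-definite M using only products v -> M v."""
    x = np.zeros_like(b, dtype=float)
    r = b - matvec(x)
    p = r.copy()
    rs = r @ r
    for _ in range(max_iter or 10 * len(b)):
        if np.sqrt(rs) <= tol:
            break
        Mp = matvec(p)
        alpha = rs / (p @ Mp)
        x = x + alpha * p
        r = r - alpha * Mp
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x


def hypergradient(theta, w, K):
    """Gradient of the principal's utility through the agent's best response (hypothetical toy).

    Agent:     U_A(a) = theta @ a - 0.5 * a @ K @ a   (linear wage, quadratic effort cost)
    Principal: U_P(a) = w @ a - theta @ a             (output minus wage bill)
    """
    neg_hess_vp = lambda v: K @ v                     # (-d^2 U_A / da^2) v, used matrix-free
    a_star = conjugate_gradient(neg_hess_vp, theta)   # best response: theta - K a = 0
    grad_a_UP = w - theta                             # dU_P/da at a_star
    partial_theta_UP = -a_star                        # direct dependence of U_P on theta
    v = conjugate_gradient(neg_hess_vp, grad_a_UP)    # solve (-H_aa) v = grad_a U_P
    # B = d(grad_a U_A)/d theta is the identity in this toy, so B^T v = v.
    return partial_theta_UP + v


# Usage: compare with the closed form K^{-1} (w - 2 theta) available for this toy only.
rng = np.random.default_rng(0)
d = 4
M = rng.standard_normal((d, d))
K = M @ M.T + d * np.eye(d)                           # symmetric positive-definite cost matrix
w = rng.standard_normal(d)
theta = rng.standard_normal(d)
hg = hypergradient(theta, w, K)
```

In this toy the Hessian is small enough to write down, but the solver only ever touches it through `neg_hess_vp`; in a realistic model that function would be supplied by automatic differentiation as a Hessian-vector product.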

artificial intelligence, fixed parameter, machine learning

arXiv:2510.21177

Country: North America > United States > Texas (0.14)

Genre: Research Report (0.64)

Industry: Banking & Finance > Trading (0.68)

Technology:

Information Technology > Artificial Intelligence > Machine Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Optimization (0.88)


A fast, universal algorithm to learn parametric nonlinear embeddings

Miguel A. Carreira-Perpinan, Max Vladymyrov

Neural Information Processing Systems, Oct-1-2025, 23:26:36 GMT

Neural Information Processing Systems http://nips.cc/

algorithm, gradient, mapping


Country: North America > United States > California > Merced County > Merced (0.04)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.96)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.94)


Export Reviews, Discussions, Author Feedback and Meta-Reviews

Neural Information Processing Systems, Aug-19-2025, 07:07:45 GMT

Submitted by Assigned_Reviewer_1. Q1: The authors propose a flexible and interpretable kernel (the CSM kernel), building on spectral mixture kernels, for learning relationships between multiple tasks. The starting point is to use Gaussian processes with single-component spectral mixture kernels as the basis functions in a linear model of coregionalisation (SM-LMC). However, SM-LMC does not capture the phases between channels. The authors therefore propose the cross spectral mixture kernel, which mixes phase-shifted versions of spectral mixture kernels across channels. The resulting kernel is interpretable and flexible.
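The review describes the cross spectral mixture idea only in words. As a rough illustration, the sketch below builds a two-channel covariance from one one-dimensional spectral mixture component, k(τ) = w exp(-2π²τ²v) cos(2πμτ), and lets the cross-channel block use a phase-shifted cosine. The phase-shift form, the amplitude coupling sqrt(a1·a2), and all function and parameter names are our own assumptions for illustration, not the paper's exact CSM parameterization. The usage lines serve as a numerical sanity check that the phase-shifted cross terms still give a symmetric positive semidefinite joint Gram matrix on a grid.

```python
import numpy as np


def sm_component(tau, weight, mean_freq, variance, phase=0.0):
    """One spectral mixture component: w * exp(-2 pi^2 tau^2 v) * cos(2 pi mu tau + phase)."""
    envelope = np.exp(-2.0 * np.pi**2 * tau**2 * variance)
    return weight * envelope * np.cos(2.0 * np.pi * mean_freq * tau + phase)


def two_channel_gram(t, mean_freq=0.5, variance=0.05, phase=0.8, amps=(1.0, 2.0)):
    """Joint Gram matrix for two channels observed at the same inputs t (hypothetical form)."""
    tau = t[:, None] - t[None, :]                   # tau[i, j] = t_i - t_j
    a1, a2 = amps
    cross = np.sqrt(a1 * a2)                        # assumed amplitude coupling between channels
    k11 = sm_component(tau, a1, mean_freq, variance)
    k22 = sm_component(tau, a2, mean_freq, variance)
    k12 = sm_component(tau, cross, mean_freq, variance, phase)
    k21 = sm_component(tau, cross, mean_freq, variance, -phase)   # equals k12.T
    return np.block([[k11, k12], [k21, k22]])


# Usage: the joint matrix should be symmetric with no (numerically) negative eigenvalues.
t = np.linspace(0.0, 10.0, 40)
G = two_channel_gram(t)
eig = np.linalg.eigvalsh(G)
```

Setting `phase=0` recovers a cross-covariance without phase information, which is the limitation of SM-LMC that the reviewer points out.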

application, kernel, spectral mixture kernel


Country: North America > Canada > Quebec > Montreal (0.04)

Industry: Health & Medicine (0.32)

Technology: Information Technology > Artificial Intelligence > Machine Learning (1.00)


Fast Gaussian process inference by exact Matérn kernel decomposition

Nicolas Langrené, Xavier Warin, Pierre Gruet

arXiv.org Machine Learning, Aug-5-2025

To speed up Gaussian process inference, a number of fast approximate kernel matrix-vector multiplication (MVM) algorithms have been proposed over the years. In this paper, we establish an exact fast kernel MVM algorithm based on an exact decomposition of the kernel into weighted empirical cumulative distribution functions, compatible with a class of kernels that includes multivariate Matérn kernels with half-integer smoothness parameter. The algorithm uses a divide-and-conquer approach, in which the outputs of the sorting steps are stored in a data structure. We also propose a new algorithm that accounts for a linear fixed-effects predictor function. Our numerical experiments confirm that the algorithm is very effective for low-dimensional Gaussian process inference problems with hundreds of thousands of data points. An implementation of our algorithm is available at https://gitlab.com/warin/fastgaussiankernelregression.git.
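For the simplest member of this kernel class, the one-dimensional Matérn kernel with smoothness 1/2 (the exponential or Laplace kernel), the idea is easy to see: for x_j ≤ x_i, exp(-(x_i - x_j)/ℓ) factors as exp(-x_i/ℓ)·exp(x_j/ℓ), so each output is a weighted cumulative sum over the sorted points. The sketch below is our own minimal illustration of that special case only, written with numerically stable running sums instead of raw exponentials; it is not the authors' multivariate divide-and-conquer algorithm or the code in their repository, and `laplace_kernel_mvm` and `lengthscale` are illustrative names.

```python
import numpy as np


def laplace_kernel_mvm(x, v, lengthscale=1.0):
    """Exact product K @ v with K[i, j] = exp(-|x_i - x_j| / lengthscale), for 1-D inputs.

    After an O(n log n) sort, two O(n) running sums replace the O(n^2) dense product.
    """
    order = np.argsort(x)
    xs, vs = x[order], v[order]
    n = len(xs)
    decay = np.exp(-np.diff(xs) / lengthscale)     # exp(-(x_{k+1} - x_k) / l) for sorted points
    left = np.empty(n)    # left[k]  = sum_{j <= k} exp(-(x_k - x_j) / l) v_j
    right = np.empty(n)   # right[k] = sum_{j > k}  exp(-(x_j - x_k) / l) v_j
    left[0] = vs[0]
    for k in range(1, n):
        left[k] = decay[k - 1] * left[k - 1] + vs[k]
    right[n - 1] = 0.0
    for k in range(n - 2, -1, -1):
        right[k] = decay[k] * (right[k + 1] + vs[k + 1])
    out = np.empty(n)
    out[order] = left + right                       # undo the sort
    return out


# Usage: compare with the dense O(n^2) product.
rng = np.random.default_rng(0)
x = rng.uniform(0.0, 10.0, size=300)
v = rng.standard_normal(300)
dense = np.exp(-np.abs(x[:, None] - x[None, :]) / 0.7) @ v
fast = laplace_kernel_mvm(x, v, lengthscale=0.7)
```

Because the product is exact rather than approximate, it can be dropped into an iterative solver such as conjugate gradient for the kernel system without changing the solution it converges to.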

artificial intelligence, kernel, machine learning

arXiv:2508.01864

Country:

Asia > China > Guangdong Province > Zhuhai (0.04)
Europe > Switzerland > Zürich > Zürich (0.04)
Asia > Middle East > Jordan (0.04)

Genre: Research Report (0.81)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.88)


Comparing regularisation paths of (conjugate) gradient estimators in ridge regression

Laura Hucker, Markus Reiß, Thomas Stark

arXiv.org Machine Learning, Mar-10-2025

We consider standard gradient descent, gradient flow and conjugate gradients as iterative algorithms for minimizing a penalized ridge criterion in linear regression. While it is well known that conjugate gradients exhibit fast numerical convergence, the statistical properties of their iterates are more difficult to assess due to inherent nonlinearities and dependencies. On the other hand, standard gradient flow is a linear method with well known regularizing properties when stopped early. By an explicit non-standard error decomposition we are able to bound the prediction error for conjugate gradient iterates by a corresponding prediction error of gradient flow at transformed iteration indices. This way, the risk along the entire regularisation path of conjugate gradient iterations can be compared to that for regularisation paths of standard linear methods like gradient flow and ridge regression. In particular, the oracle conjugate gradient iterate shares the optimality properties of the gradient flow and ridge regression oracles up to a constant factor. Numerical examples show the similarity of the regularisation paths in practice.
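A small numerical illustration of the two regularisation paths: both gradient descent and conjugate gradients are started at zero on the ridge normal equations, and their iterates move from 0 toward the ridge solution, so stopping early acts as extra regularisation. This is our own sketch and does not reproduce the paper's error decomposition or its transformed iteration indices. The data, the penalty `lam`, and the helper names `cg_path` and `gd_path` are hypothetical.

```python
import numpy as np


def cg_path(A, b, n_iter):
    """Conjugate gradient iterates x_0 = 0, x_1, ... for A x = b with A symmetric positive definite."""
    x = np.zeros(len(b))
    r = b.copy()
    p = r.copy()
    rs = r @ r
    path = [x.copy()]
    for _ in range(n_iter):
        if np.sqrt(rs) <= 1e-14 * np.linalg.norm(b):   # already converged
            break
        Ap = A @ p
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        path.append(x.copy())
    return path


def gd_path(A, b, n_iter):
    """Gradient descent iterates for 0.5 x^T A x - b^T x with step size 1 / lambda_max(A)."""
    step = 1.0 / np.linalg.eigvalsh(A).max()
    x = np.zeros(len(b))
    path = [x.copy()]
    for _ in range(n_iter):
        x = x - step * (A @ x - b)
        path.append(x.copy())
    return path


# Usage: ridge criterion ||y - X beta||^2 + lam ||beta||^2, i.e. (X^T X + lam I) beta = X^T y.
rng = np.random.default_rng(0)
n, d, lam = 100, 6, 0.5
X = rng.standard_normal((n, d)) * np.linspace(1.0, 5.0, d)
y = X @ rng.standard_normal(d) + 0.1 * rng.standard_normal(n)
A = X.T @ X + lam * np.eye(d)
rhs = X.T @ y
beta_ridge = np.linalg.solve(A, rhs)

cg = cg_path(A, rhs, d)
gd = gd_path(A, rhs, d)
cg_norms = [np.linalg.norm(beta) for beta in cg]
```

After d steps CG has essentially reached the ridge solution, while gradient descent is still partway along its path; started from zero, the norms of the CG iterates grow monotonically, which is one reason early-stopped CG behaves like a regularised estimator.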

gradient flow, regression, regularisation path

arXiv:2503.05542

Country:

Europe > Austria > Vienna (0.14)
Europe > Russia (0.04)
Europe > Netherlands > South Holland > Dordrecht (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.82)


Review for NeurIPS paper: Faster Randomized Infeasible Interior Point Methods for Tall/Wide Linear Programs

Neural Information Processing Systems, Jan-25-2025, 01:12:52 GMT

Weaknesses: This paper provides a nice advance in the theory of infeasible-start long-step IPMs; however, the novelty of the approach and the relation of this work to prior work could be clarified. First, solving regression problems in a matrix A in nearly linear time, when A has many more rows than columns, has been the subject of a line of research, e.g. These results, including ones based on the subspace embedding result used in this paper, readily extend to solving linear systems in A^T A, and this has been used by the Theoretical Computer Science papers mentioned for implementing short-step IPMs. Consequently, I think it would have been beneficial to state earlier that the paper uses the known linear-system-solving machinery of subspace embeddings to build preconditioners (rather than just saying that "Randomized Linear Algebra" is used) and to put this in the context of prior work. There may be novelty in the particular way the paper uses conjugate gradient and subspace embeddings; however, the paper would be strengthened if it articulated how this differs from the previous literature. As the appendix points out, conjugate gradient can be replaced with other iterative methods, which possibly puts the approach closer to the ones in the literature.

In light of the previous paragraph, I think more of the novelty in the paper may lie in exactly how it handles the error from approximate linear system solves in a way that is sensitive to the design of the preconditioner.
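The machinery the reviewer refers to, using a subspace embedding to build a preconditioner and then running conjugate gradient on a system in A^T A, can be sketched for a tall least-squares problem. The sketch assumes a dense Gaussian embedding with 4d rows (sparse embeddings are also common), takes the triangular factor R of the sketched matrix SA as the preconditioner, and solves min ||Ax - b|| rather than the interior-point systems the paper targets. `sketch_preconditioned_lstsq` and its parameters are hypothetical names.

```python
import numpy as np


def sketch_preconditioned_lstsq(A, b, sketch_factor=4, tol=1e-10, max_iter=200, seed=0):
    """Least squares min ||A x - b|| for tall A via a sketched preconditioner and CG."""
    n, d = A.shape
    rng = np.random.default_rng(seed)
    m = sketch_factor * d
    S = rng.standard_normal((m, n)) / np.sqrt(m)   # Gaussian subspace embedding (assumed choice)
    _, R = np.linalg.qr(S @ A)                     # S A = Q R, R is d x d upper triangular

    def op(y):
        # Preconditioned normal-equation operator y -> R^{-T} A^T A R^{-1} y (symmetric positive definite).
        u = A @ np.linalg.solve(R, y)
        return np.linalg.solve(R.T, A.T @ u)

    rhs = np.linalg.solve(R.T, A.T @ b)
    y = np.zeros(d)
    r = rhs.copy()
    p = r.copy()
    rs = r @ r
    iters = 0
    while np.sqrt(rs) > tol * np.linalg.norm(rhs) and iters < max_iter:
        q = op(p)
        alpha = rs / (p @ q)
        y = y + alpha * p
        r = r - alpha * q
        rs_new = r @ r
        p = r + (rs_new / rs) * p
        rs = rs_new
        iters += 1
    return np.linalg.solve(R, y), iters            # undo the change of variables x = R^{-1} y


# Usage: badly scaled columns would slow plain CG on A^T A; the preconditioned system is well conditioned.
rng = np.random.default_rng(1)
n, d = 2000, 10
A = rng.standard_normal((n, d)) * np.logspace(0, 4, d)
b = rng.standard_normal(n)
x, iters = sketch_preconditioned_lstsq(A, b)
x_ls = np.linalg.lstsq(A, b, rcond=None)[0]
```

Swapping the CG loop for another Krylov method (for example LSQR on A R^{-1}) leaves the preconditioner unchanged, which echoes the reviewer's remark that the iterative method is interchangeable.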

conjugate gradient, randomized infeasible interior point method, tall wide linear program


Technology: Information Technology > Artificial Intelligence > Representation & Reasoning > Mathematical & Statistical Methods (0.72)


Review for NeurIPS paper: Probabilistic Linear Solvers for Machine Learning

Neural Information Processing Systems, Jan-24-2025, 03:57:33 GMT

Strengths: EDIT after rebuttal: Thank you, authors, for clarifying the following:
- GP regression on log(Rayleigh_i): satisfactory reply; this algorithm takes into account uncertainty about eigenvalues beyond t 1.
- Transfer learning: reusing the posterior covariance as a prior makes the method converge faster than reusing just the mean.

I'm still confused about the following, but a little less so:
- Empirical Bayes: this is indeed common, and in many applications the prior is updated as more data comes in. For example, in Bayesian optimization, the GP hyperparameters are re-optimized after each extra point is acquired. However, the weird thing here, which the authors clarified in the rebuttal, is that the prior used at each time step *contains future observations in it*. Does this imply that the posterior covariance cannot be calculated in the middle of the algorithm, before it terminates and the full S matrix is available?

machine learning, neurips paper, probabilistic linear solver


Technology: Information Technology > Artificial Intelligence > Machine Learning (0.89)


Conjugate-Gradient-like Based Adaptive Moment Estimation Optimization Algorithm for Deep Learning

Jiawu Tian, Liwei Xu, Xiaowei Zhang, Yongqi Li

arXiv.org Artificial Intelligence, May-11-2024

Training deep neural networks is a challenging task. To speed up training and improve the performance of deep neural networks, we modify the vanilla conjugate gradient into a conjugate-gradient-like direction and incorporate it into generic Adam, yielding a new optimization algorithm for deep learning named CG-like-Adam. Specifically, the gradient in both the first-order and the second-order moment estimates of generic Adam is replaced by the conjugate-gradient-like direction. The convergence analysis covers the cases where the exponential moving average coefficient of the first-order moment estimate is constant and where the first-order moment estimate is unbiased. Numerical experiments on the CIFAR-10/100 datasets show the superiority of the proposed algorithm.

Introduction

Deep learning has been applied in many areas, such as recommendation systems [1], natural language processing [2], image recognition [3], and reinforcement learning [4]. Neural network models are the main object of study in deep learning; a network consists of an input layer, hidden layers, and an output layer. Each layer contains a certain number of neurons, and the neurons are connected to one another in a particular way. The parameters of each neuron and of its connections determine the performance of the deep learning model.
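The general idea, feeding a conjugate-gradient-like direction instead of the raw gradient into both of Adam's moment estimates, can be sketched as follows. This is a hedged illustration, not the authors' algorithm: we assume a Fletcher-Reeves coefficient that is clipped to [0, 1] and damped by 1/t so that the direction stays bounded, and the paper's actual coefficient schedule may differ. `cg_like_adam` and its defaults are hypothetical.

```python
import numpy as np


def cg_like_adam(grad_fn, x0, lr=0.05, beta1=0.9, beta2=0.999, eps=1e-8, steps=300):
    """Adam whose moment estimates are fed a conjugate-gradient-like direction d_t.

    d_t = g_t + gamma_t * d_{t-1}, with gamma_t a Fletcher-Reeves ratio clipped to
    [0, 1] and damped by 1/t; both the clipping and the damping are assumptions.
    """
    x = np.asarray(x0, dtype=float).copy()
    m = np.zeros_like(x)
    v = np.zeros_like(x)
    d_prev = np.zeros_like(x)
    g_prev_sq = 0.0
    for t in range(1, steps + 1):
        g = grad_fn(x)
        g_sq = float(g @ g)
        fr = g_sq / g_prev_sq if g_prev_sq > 0 else 0.0   # Fletcher-Reeves ratio
        gamma = min(fr, 1.0) / t
        d = g + gamma * d_prev                  # conjugate-gradient-like direction
        m = beta1 * m + (1 - beta1) * d         # first moment of d (instead of g)
        v = beta2 * v + (1 - beta2) * d * d     # second moment of d (instead of g)
        m_hat = m / (1 - beta1**t)              # bias corrections as in Adam
        v_hat = v / (1 - beta2**t)
        x = x - lr * m_hat / (np.sqrt(v_hat) + eps)
        d_prev, g_prev_sq = d, g_sq
    return x


# Usage on an ill-conditioned quadratic f(x) = 0.5 * (x_1^2 + 10 x_2^2).
curv = np.array([1.0, 10.0])
loss = lambda x: 0.5 * float(curv @ (x * x))
x0 = np.array([1.0, -2.0])
x_final = cg_like_adam(lambda x: curv * x, x0)
```

With the damping factor removed and no clipping, the direction reduces to an unmodified Fletcher-Reeves recursion, which can grow without bound when gradient norms increase; the safeguards here are one simple way to keep the sketch stable.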

algorithm, cg-like-adam

arXiv:2404.01714

Country:

North America > Canada > Ontario > Toronto (0.14)
Europe > Russia (0.04)
Asia > Russia (0.04)
Asia > China (0.04)

Genre: Research Report (0.82)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
